Recently, it has been shown that every recursively enumerable language can be generated by a scattered
context grammar with no more than three nonterminals. However, in that construction, the
maximal number of nonterminals simultaneously rewritten during a derivation step depends on many 
factors, such as the cardinality of the alphabet of the generated language and the structure of the 
generated language itself. This paper improves the result by showing that the maximal number of 
nonterminals simultaneously rewritten during any derivation step can be limited by a small constant 
regardless of other factors. 

1 Introduction 

Scattered context grammars, introduced by Greibach and Hopcroft in [2], are partially parallel rewriting
devices based on context-free productions, where in each derivation step, a finite number of nonterminal 
symbols of the current sentential form is simultaneously rewritten. As scattered context grammars were 
originally defined without erasing productions, it is no surprise that they generate only context sensitive 
languages. On the other hand, however, the question of whether every context sensitive language can 
be generated by a (nonerasing) scattered context grammar is an interesting, longstanding open problem. 
Note that the natural generalization of these grammars allowing erasing productions makes them compu- 
tationally complete (see @). For some conditions when a scattered context grammar can be transformed 
to an equivalent nonerasing scattered context grammar, the reader is referred to @. In what follows, we 
implicitly consider scattered context grammars with erasing productions. 

Although many interesting results have been achieved in the area of the descriptional complexity of 
scattered context grammars during the last few decades, the main motivation to re-open this investigation 
area comes from an interesting, recently started research project on building parsers and compilers of
programming languages making use of advantages of scattered context grammars (see, for instance, 
papers [3] and [10] for more information on the advantages and problems arising from this approach).

To give an insight into the descriptional complexity of scattered context grammars (including erasing
productions), note that it is proved in [8] that one-nonterminal scattered context grammars are not
powerful enough to generate all context sensitive languages: it is demonstrated there that they are not
able to generate the language $\{a^{2^{2^n}} : n \ge 0\}$ (which is scattered context, see Lemma 2 below).
In addition, although they are not able to generate all these languages, it is an open problem (because of
the erasing productions) whether they can generate a language which is not context sensitive. On the other
hand, it is proved in [7] that three nonterminals are sufficient for scattered context grammars to
characterize the family of recursively enumerable languages. In that proof, however, the maximal number of
nonterminal symbols simultaneously rewritten during any derivation step depends on the alphabet of the
generated language and on the structure of the generated language itself.

Later, in [12], Vaszil gave another construction limiting the maximal number of nonterminals
simultaneously rewritten during one derivation step. However, this improvement comes at the price of increasing
the number of nonterminals. Although Vaszil's construction has been improved since then (in the sense 
of the number of nonterminals, see for an overview of the latest results), the number of three nonter- 
minals has not been achieved. 

This paper presents a construction improving the descriptional complexity of scattered context gram- 
mars with three nonterminals by limiting the maximal number of nonterminals simultaneously rewritten 
during any derivation step regardless of any other factors. This result is achieved by the combination of 
approaches of both previously mentioned papers. Specifically, this paper proves that every recursively 
enumerable language is generated by a three-nonterminal scattered context grammar, where no more 
than nine symbols are simultaneously rewritten during any derivation step. This is a significant improvement
in comparison with the result of [7], where more than 2n + 4 symbols have to be simultaneously
rewritten during almost all derivation steps of any successful derivation, for some n strictly greater than 
the number of terminal symbols of the generated language plus two. To be more precise, n strongly 
depends not only on the terminal alphabet of the generated language, but also on the structure of the 
generated language itself. 

Finally, note that, as in [7], we do not give a constant limit on the number of non-context-free
productions, which is limited by fixed constants in [12] and [5]. Finding such a limit is
an interesting challenge for future research, as is determining whether the number of nonterminals
can be reduced to two. See also the overview of known results and open problems in the conclusion.

2 Preliminaries and definitions 

We assume that the reader is familiar with formal language theory (see [11]). For an alphabet (finite
nonempty set) $V$, $V^*$ represents the free monoid generated by $V$ with the unit denoted by $\lambda$.
Set $V^+ = V^* - \{\lambda\}$. For $w \in V^*$ and $a \in V$, let $|w|$, $|w|_a$, and $w^R$ denote the
length of $w$, the number of occurrences of $a$ in $w$, and the mirror image of $w$, respectively.

A scattered context grammar is a quadruple $G = (N,T,P,S)$, where $N$ is the alphabet of nonterminals,
$T$ is the alphabet of terminals such that $N \cap T = \emptyset$, $S \in N$ is the start symbol, and $P$
is a finite set of productions of the form $(A_1, A_2, \ldots, A_n) \to (x_1, x_2, \ldots, x_n)$, for some
$n \ge 1$, where $A_i \in N$ and $x_i \in (N \cup T)^*$, for all $i = 1, \ldots, n$. If $n \ge 2$, then the
production is said to be non-context-free; otherwise, it is context-free. In addition, if for each
$i = 1, \ldots, n$, $x_i \ne \lambda$, then the production is said to be nonerasing; $G$ is nonerasing if
all its productions are nonerasing.

For $u, v \in (N \cup T)^*$, $u \Rightarrow v$ in $G$ provided that

• $u = u_1 A_1 u_2 A_2 u_3 \ldots u_n A_n u_{n+1}$,

• $v = u_1 x_1 u_2 x_2 u_3 \ldots u_n x_n u_{n+1}$, and

• $(A_1, A_2, \ldots, A_n) \to (x_1, x_2, \ldots, x_n) \in P$,

where $u_i \in (N \cup T)^*$, for all $i = 1, \ldots, n+1$. The language generated by $G$ is defined as

$L(G) = \{w \in T^* : S \Rightarrow^* w\}$,

where $\Rightarrow^*$ denotes the reflexive and transitive closure of the relation $\Rightarrow$. A language
$L$ is said to be a (nonerasing) scattered context language if there is a (nonerasing) scattered context
grammar $G$ such that $L = L(G)$.
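
The derivation step defined above can be illustrated by a short Python sketch. It is an illustration only,
under assumptions of our own: sentential forms are lists of symbols, a production is a pair of sequences
(left-hand sides, right-hand sides), the empty list plays the role of the empty string, and the name
`step` is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

# A production (A_1, ..., A_n) -> (x_1, ..., x_n); symbols are strings and
# each x_i is a list of symbols (the empty list stands for the empty string).
Production = Tuple[Sequence[str], Sequence[Sequence[str]]]


def step(form: Sequence[str], prod: Production) -> Iterator[List[str]]:
    """Yield every v such that form => v by one application of prod."""
    lhs, rhs = prod

    def place(start: int, i: int, chosen: List[int]) -> Iterator[List[int]]:
        # Choose positions p_1 < ... < p_n with form[p_i] == A_i.
        if i == len(lhs):
            yield list(chosen)
            return
        for p in range(start, len(form)):
            if form[p] == lhs[i]:
                chosen.append(p)
                yield from place(p + 1, i + 1, chosen)
                chosen.pop()

    for positions in place(0, 0, []):
        out: List[str] = []
        prev = 0
        for p, x in zip(positions, rhs):
            out.extend(form[prev:p])  # the context u_i is copied unchanged
            out.extend(x)             # A_i is replaced by x_i
            prev = p + 1
        out.extend(form[prev:])       # the trailing context u_{n+1}
        yield out
```

Since the left-hand sides may match at several combinations of positions, `step` yields one result per
choice of positions; the same string may therefore be yielded more than once.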

3 Main results

First, we give a simple example of a nonerasing scattered context grammar generating a non-context-free
language. Then, we present a nonerasing scattered context grammar generating the nontrivial context
sensitive language $\{a^{l^{k^n}} : n \ge 0\}$, for any $k, l \ge 2$. Thus, for $k = l = 2$, we have a
scattered context grammar generating the language mentioned in the introduction. Note that independently
of $k$ and $l$, the grammar has only twelve nonterminals and fourteen productions, ten of which are
non-context-free.

Example 1. Let $G = (\{S,A,B,C\},\{a,b,c\},P,S)$ be a scattered context grammar with $P$ containing
the following productions:

• $(S) \to (ABC)$

• $(A,B,C) \to (aA,bB,cC)$

• $(A,B,C) \to (a,b,c)$

Then, it is not hard to see that the language generated by $G$ is

$L(G) = \{a^n b^n c^n : n \ge 1\}$. $\diamond$
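
The claim of Example 1 can be checked for short words by exhaustive search. The following Python sketch is
an illustration under our own assumptions: every symbol is a single character, nonterminals are uppercase,
and the names `successors` and `language` are hypothetical. Because the grammar is nonerasing, sentential
forms never shrink, so forms longer than the bound can be discarded safely.

```python
from collections import deque

# Productions of Example 1 as (left-hand tuple, right-hand tuple of strings).
PRODUCTIONS = [
    (("S",), ("ABC",)),
    (("A", "B", "C"), ("aA", "bB", "cC")),
    (("A", "B", "C"), ("a", "b", "c")),
]


def successors(form, lhs, rhs):
    """All strings obtained from form by one application of lhs -> rhs."""
    results = set()

    def go(start, i, prefix):
        if i == len(lhs):
            results.add(prefix + form[start:])
            return
        for p in range(start, len(form)):
            if form[p] == lhs[i]:
                go(p + 1, i + 1, prefix + form[start:p] + rhs[i])

    go(0, 0, "")
    return results


def language(max_len):
    """Terminal words of length at most max_len derivable from S."""
    seen, queue, words = {"S"}, deque(["S"]), set()
    while queue:
        form = queue.popleft()
        if form.islower():            # only terminals are left
            words.add(form)
            continue
        for lhs, rhs in PRODUCTIONS:
            for nxt in successors(form, lhs, rhs):
                if len(nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return sorted(words, key=len)


print(language(9))  # ['abc', 'aabbcc', 'aaabbbccc']
```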

Lemma 2. For any $k, l \ge 2$, the language $\{a^{l^{k^n}} : n \ge 0\}$ is a nonerasing scattered context
language.

Proof: Let $G = (\{S,A,A',A'',B,C,X,X_2,X_3,Y,Z,Z'\},\{a\},P,S)$ be a nonerasing scattered context
grammar with $P$ containing the following productions:

1. $(S) \to (a^{l})$,

2. $(S) \to (a^{l^{k}})$,

3. $(S) \to (a^{l^{k^2}})$,

4. $(S) \to (A''A^{l-1}X_2B^{k^2-3}A'C^{k^2-1}XY)$,

* first stage

5. $(A',C,X,Y) \to (B^{k-1},A',X,C^{k}Y)$,

6. $(A',X,Y) \to (B^{k-1},A',C^{k-1}XY)$,

7. $(A',X,Y) \to (Z,Z,Y)$,

8. $(Z,C,Z,Y) \to (Z,B^{k-1},Z,Y)$,

9. $(Z,Z,Y) \to (B,B^{k-1},X_3)$,

* second stage

10. $(A'',A,X_2,X_3) \to (a^{l-1},A'',X_2A^{l},X_3)$,

11. $(A'',X_2,B,X_3) \to (a^{l-1},A'',A^{l-1}X_2,X_3)$,

12. $(A'',X_2,X_3) \to (Z',Z',X_3)$,

13. $(Z',A,Z',X_3) \to (Z',a^{l-1},Z',X_3)$,

14. $(Z',Z',X_3) \to (a,a^{l-1},a^{l-1})$.

Then, all the possible successful derivations of $G$ are summarized in the following (strings in the square
brackets are regular expressions describing the productions applied during the derivations).

$S \Rightarrow a^{l}$   [①]

$S \Rightarrow a^{l^{k}}$   [②]

$S \Rightarrow a^{l^{k^2}}$   [③]

$S \Rightarrow A''A^{l-1}X_2B^{k^2-3}A'C^{k^2-1}XY$   [④]
$\Rightarrow^* A''A^{l-1}X_2B^{k^n-2}X_3$   [(⑤ + ⑥)*⑦⑧*⑨]
$\Rightarrow^* a^{l^{k^n-1}-l}A''A^{l^{k^n-1}-1}X_2X_3$   [(⑩ + ⑪)*]
$\Rightarrow^* a^{l^{k^n}}$   [⑫⑬*⑭]


The first three cases are clear. In the last case, $l$ symbols $A$ (including $A''$) are generated in the
first derivation step. Then, the derivation can be divided into two parts: in the first part, only
productions from the first stage are applied (because there is no $X_3$ in the sentential form), generating
$k^n$ auxiliary symbols ($B$'s, $X_2$, and $X_3$). Then, in the second part, only productions from the
second stage are applied (because there is no $Y$ in the sentential form), generating $l^{k^n}$ symbols
$a$. More precisely, we prove that all sentential forms of a successful derivation containing $X_3$, i.e.,
of the second part, are of the form

$a^{l^{m-1}-l}A''A^{l^{m-1}-1}X_2B^{k^n-m}X_3$,

for all $m = 2, 3, \ldots, k^n$ and $n \ge 3$. Clearly, for $m = 2$, the sentential form is
$A''A^{l-1}X_2B^{k^n-2}X_3$. For $m = k^n$, we have $a^{l^{k^n-1}-l}A''A^{l^{k^n-1}-1}X_2X_3$ and it is not
hard to prove that

$a^{l^{k^n-1}-l}A''A^{l^{k^n-1}-1}X_2X_3 \Rightarrow^* a^{l^{k^n-1}-l}\,a\,a^{(l-1)(l^{k^n-1}-1)}\,a^{l-1}\,a^{l-1} = a^{l^{k^n}}$.

Thus, assume that $2 \le m < k^n$. Then,

$a^{l^{m-1}-l}A''A^{l^{m-1}-1}X_2B^{k^n-m}X_3$
$\Rightarrow^* a^{l^{m-1}-l}a^{(l-1)(l^{m-1}-1)}A''X_2A^{l(l^{m-1}-1)}B^{k^n-m}X_3$   [⑩*]
$= a^{l^{m}-2l+1}A''X_2A^{l^{m}-l}B^{k^n-m}X_3$
$\Rightarrow a^{l^{m}-2l+1}a^{l-1}A''A^{l^{m}-l}A^{l-1}X_2B^{k^n-m-1}X_3$   [⑪]
$= a^{l^{m}-l}A''A^{l^{m}-1}X_2B^{k^n-(m+1)}X_3$.

For a complete proof of the correctness of this construction, the reader is referred to JH. □ 
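
The counting behind the successful derivation of Lemma 2 can be checked mechanically. The following Python
sketch is an illustration only and not a substitute for the proof: it tracks just the numbers of symbols
along the derivation summarized above, assuming $n \ge 3$, and the function names `first_stage` and
`second_stage` are hypothetical.

```python
def first_stage(k: int, n: int) -> int:
    """Number of B's when X_3 appears, for a target exponent n >= 3."""
    b, c = k**2 - 3, k**2 - 1   # after production 4: B^b A' C^c X Y
    for _ in range(n - 3):      # one round of 5* 6
        b += (k - 1) * c        # production 5, once per C in front of X
        c = k * c               # ... each time adding C^k in front of Y
        b += k - 1              # production 6 rewrites A' to B^(k-1)
        c += k - 1              # ... and Y to C^(k-1) X Y
    b += (k - 1) * c            # production 8, once per C between the Z's
    b += 1 + (k - 1)            # production 9: Z -> B and Z -> B^(k-1)
    return b


def second_stage(l: int, b: int) -> int:
    """Number of a's generated from A'' A^(l-1) X_2 B^b X_3."""
    a, big_a = 0, l - 1
    for _ in range(b):          # one round of 10* 11 per symbol B
        a += (l - 1) * big_a    # production 10, once per A in front of X_2
        big_a = l * big_a       # ... each time adding A^l behind X_2
        a += l - 1              # production 11 rewrites A'' to a^(l-1)
        big_a += l - 1          # ... and one B to A^(l-1) X_2
    a += (l - 1) * big_a        # production 13, once per A
    a += 1 + 2 * (l - 1)        # production 14
    return a


for k in (2, 3):
    for l in (2, 3):
        for n in (3, 4):
            assert first_stage(k, n) == k**n - 2
            assert second_stage(l, first_stage(k, n)) == l ** (k**n)
```

For instance, with $k = l = 2$ and $n = 3$, the first stage ends with six symbols $B$ and the second stage
produces $2^{8} = 256$ symbols $a$.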
Now, we prove the main result of this paper. 

Theorem 3. Every recursively enumerable language is generated by a scattered context grammar with 
three nonterminals, where no more than nine nonterminals are simultaneously rewritten during one 
derivation step. 

Proof: Let $L$ be a recursively enumerable language. Then, by Geffert [1], there is a grammar
$G' = (\{S',A,B,C,D\},T,P' \cup \{AB \to \lambda, CD \to \lambda\},S')$, where $P'$ contains only
context-free productions of the following three forms: $S' \to uS'a$, $S' \to uS'v$, $S' \to \lambda$, for
$u \in \{A,C\}^*$, $v \in \{B,D\}^*$, and $a \in T$. In addition, Geffert proved that any successful
derivation of $G'$ is divided into two parts: the first part is of the form

$S' \Rightarrow^* w_1S'w_2w \Rightarrow w_1w_2w$,

generated only by context-free productions from $P'$, where $w_1 \in \{A,C\}^*$, $w_2 \in \{B,D\}^*$, and
$w \in T^*$, and the second part is of the form

$w_1w_2w \Rightarrow^* w$,

generated only by productions $AB \to \lambda$ and $CD \to \lambda$.

Let $G = (\{S,A,B\},T,P,S)$ be a scattered context grammar with $P$ constructed as follows:

1. $(S) \to (SBBASABBSA)$,

2. $(S,S,S) \to (S,h(u)Sh(a),S)$ if $S' \to uS'a \in P'$,

3. $(S,S,S) \to (S,h(u)Sh(v),S)$ if $S' \to uS'v \in P'$,

4. $(S,A,B,B,S,B,B,A,S) \to (\lambda,\lambda,\lambda,S,S,S,\lambda,\lambda,\lambda)$,

5. $(S,B,A,B,S,B,A,B,S) \to (\lambda,\lambda,\lambda,S,S,S,\lambda,\lambda,\lambda)$,

6. $(S,B,B,A,S,A,B,B,S) \to (\lambda,\lambda,\lambda,SBBA,S,S,\lambda,\lambda,\lambda)$,

7. $(S,B,B,A,S,A,B,B,S) \to (\lambda,\lambda,\lambda,S,S,S,\lambda,\lambda,\lambda)$,

8. $(S,S,S,A) \to (\lambda,\lambda,\lambda,\lambda)$,

where $h$ is a homomorphism from $(\{A,B,C,D\} \cup T)^*$ to $(\{A,B\} \cup T)^*$ defined as $h(A) = ABB$,
$h(B) = BBA$, $h(C) = h(D) = BAB$, and $h(a) = AaBB$, for all $a \in T$.
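
As an illustration only, the homomorphism $h$ can be sketched in Python. The sketch assumes that every
symbol other than A, B, C, D is a terminal; the names `h`, `cancels`, and `words` are hypothetical. It also
checks, for short words, a property that follows from the definition of $h$: $h(A)$ is the mirror image of
$h(B)$ and $h(C) = h(D)$ is a palindrome, so $w_1w_2$ reduces to the empty string by $AB \to \lambda$ and
$CD \to \lambda$ exactly when $h(w_1)$ is the mirror image of $h(w_2)$.

```python
from itertools import product

# Images of the nonterminals of G' under h; any other symbol is assumed
# to be a terminal a, which is mapped to AaBB.
H = {"A": "ABB", "B": "BBA", "C": "BAB", "D": "BAB"}


def h(word: str) -> str:
    """Image of word under the homomorphism h."""
    return "".join(H.get(x, "A" + x + "BB") for x in word)


def cancels(w1: str, w2: str) -> bool:
    """True iff w1 w2 reduces to the empty string by AB -> lambda and
    CD -> lambda, for w1 over {A, C} and w2 over {B, D}."""
    partner = {"A": "B", "C": "D"}
    return [partner[x] for x in w1] == list(reversed(w2))


def words(alphabet: str, max_len: int):
    """All words over alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


for w1 in words("AC", 3):
    for w2 in words("BD", 3):
        assert cancels(w1, w2) == (h(w1) == h(w2)[::-1])
```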

To prove that $L(G') \subseteq L(G)$, consider a successful derivation of $w \in T^*$ in $G'$. Such a
derivation is of the form described above, where the second part of the derivation is according to a
sequence $p_1p_2 \cdots p_r$ of productions $AB \to \lambda$ and $CD \to \lambda$, for some $r \ge 0$. Then,
in $G$, the derivation of $w$ can be simulated by applications of the corresponding productions constructed
above as follows:

$S \Rightarrow SBBASABBSA$   [①]
$\Rightarrow^* SBBAh(w_1)Sh(w_2w)ABBSA$   [②*③*]
$\Rightarrow^* Sh(w_1)Sh(w_2)SwA$   [⑥*⑦]
$\Rightarrow^* SSSwA$   [$q_r \cdots q_2q_1$]
$\Rightarrow w$   [⑧],

where, for each $1 \le i \le r$,

$q_i = \begin{cases}
(S,A,B,B,S,B,B,A,S) \to (\lambda,\lambda,\lambda,S,S,S,\lambda,\lambda,\lambda), & \text{if } p_i = AB \to \lambda, \\
(S,B,A,B,S,B,A,B,S) \to (\lambda,\lambda,\lambda,S,S,S,\lambda,\lambda,\lambda), & \text{otherwise.}
\end{cases}$

On the other hand, to prove that $L(G) \subseteq L(G')$, we demonstrate that $G'$ generates any $x \in L(G)$.

First, we prove that each of the productions ① and ⑧ is applied exactly once in each successful
derivation of $G$. To prove this, let $S \Rightarrow^* x$ be a derivation of a string
$x \in (\{S,A,B\} \cup T)^*$. Let $i$ be the number of applications of production ①, $j$ be the number of
applications of production ⑧, and $2k$ be the number of $B$'s in $x$. Then, it is not hard to see that

• $|x|_B = 2k$,

• $|x|_A = k + i - j$,

• $|x|_S = 1 + 2i - 3j$.

Thus, for $x \in T^*$, we have that $2k = 0$ and $i = j$. In addition, $1 + 2i - 3i = 0$ implies that
$i = 1$, which means that each of the productions ① and ⑧ is applied exactly once in each successful
derivation of $G$: production ① as the first production and production ⑧ as the last production of the
derivation. We have shown that every successful derivation of $G$ is of the form

$S \Rightarrow SBBASABBSA \Rightarrow^* w_1Sw_2Sw_3Sw_4A \Rightarrow w_1w_2w_3w_4$,

for some terminal strings $w_1, w_2, w_3, w_4 \in T^*$.
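
The counting argument above can be double-checked with a small Python sketch. It is an illustration under
our own conventions: the components of each production are concatenated into one string, productions
constructed in 2 and 3 are represented only by the blocks $h(X)$ they insert, and the names `SHAPES`,
`BLOCKS`, `delta`, and `solutions` are hypothetical.

```python
from collections import Counter

# Components of each production concatenated (lambda is the empty string).
# Productions 2 and 3 are omitted: they rewrite the middle S to
# h(u) S h(a) or h(u) S h(v) and are covered by BLOCKS below.
SHAPES = {
    1: ("S", "SBBASABBSA"),
    4: ("SABBSBBAS", "SSS"),
    5: ("SBABSBABS", "SSS"),
    6: ("SBBASABBS", "SBBASS"),
    7: ("SBBASABBS", "SSS"),
    8: ("SSSA", ""),
}
BLOCKS = ["ABB", "BBA", "BAB", "AaBB"]  # h(A), h(B), h(C) = h(D), h(a)


def delta(lhs: str, rhs: str) -> tuple:
    """Change in the numbers of S's, A's, and B's caused by one application."""
    before, after = Counter(lhs), Counter(rhs)
    return tuple(after[x] - before[x] for x in "SAB")


def solutions(bound: int) -> list:
    """Pairs (i, j) below bound for which a terminal string (k = 0) is possible."""
    return [(i, j) for i in range(bound) for j in range(bound)
            if i - j == 0 and 1 + 2 * i - 3 * j == 0]


# Production 1 adds two S's and raises |x|_A - |x|_B / 2 by one;
# production 8 removes three S's and lowers |x|_A - |x|_B / 2 by one.
assert delta(*SHAPES[1]) == (2, 3, 4)
assert delta(*SHAPES[8]) == (-3, -1, 0)
# Productions 4 to 7 keep |x|_S and |x|_A - |x|_B / 2 unchanged.
for p in (4, 5, 6, 7):
    d_s, d_a, d_b = delta(*SHAPES[p])
    assert d_s == 0 and 2 * d_a == d_b
# Every block inserted by productions 2 and 3 has one A and two B's.
assert all(b.count("A") == 1 and b.count("B") == 2 for b in BLOCKS)
print(solutions(50))  # [(1, 1)]
```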

Furthermore, there is no production that can change the position of the middle symbol $S$. Therefore,
with respect to productions of $G$, we have that $w_1, w_2 \in \{A,B\}^*$, which along with
$w_1, w_2 \in T^*$ implies that $w_1 = w_2 = \lambda$. Thus, the previously shown successful derivation is
of the form

$S \Rightarrow SBBASABBSA \Rightarrow^* SSw_3Sw_4A \Rightarrow w_3w_4$.

Analogously, it can be seen that $w_3 \in \{BAB, BBA, AaBB : a \in T\}^*$. Therefore, for the same reason
as above, $w_3 = \lambda$ and every successful derivation of $G$ is of the form

$S \Rightarrow SBBASABBSA \Rightarrow^* SSSwA \Rightarrow w$,   (1)

for some $w \in T^*$.

Consider any inner sentential form of a successful derivation of $G$. Such a sentential form is a string
of the form

$u_1Su_2Su_3Su_4A$,

for some $u_i \in (\{A,B\} \cup T)^*$, $1 \le i \le 4$. However, it is not hard to see that
$u_1 = \lambda$ and $u_4 \in T^*$; otherwise, if there is a nonterminal symbol appearing in the string
$u_1u_4$, then, according to the form of productions, none of these symbols can be removed and, therefore,
the derivation cannot be successful. Thus, every inner sentential form of any successful derivation of $G$
is of the form

$SuSvSwA$,   (2)

where $u \in (BBA + \lambda)\{ABB,BAB\}^*$, $v \in \{BAB,BBA,AaBB : a \in T\}^*$, and $w \in T^*$. Now, we
prove that

$v \in \{BBA,BAB\}^*\{AaBB : a \in T\}^*(ABB + \lambda)$.

In other words, we prove that any applications of productions ⑥ and ⑦ precede the first application of
any of productions ④ and ⑤.

Thus, consider the beginning of a successful derivation of the form

$S \Rightarrow SBBASABBSA \Rightarrow^* SBBAuSvABBSwA$,

where none of productions ⑥ and ⑦ has been applied, and the first application of one of these
productions follows. Note that during this derivation, only productions ① to ③ have been applied because
the application of production ④ or ⑤ skips some nonterminal symbols and, therefore, leads to an incorrect
sentential form (see the correct form (2) above). Clearly, $w = \lambda \in T^*$ (it is presented here for
the reason of induction).

If production ⑥ follows, the derivation proceeds

$SBBAuSvABBSwA \Rightarrow SBBAuSvSwA$,   (3)

and if production ⑦ follows, the derivation proceeds

$SBBAuSvABBSwA \Rightarrow SuSvSwA$.   (4)

In addition, $w \in T^*$ and, according to the form of productions ① to ③, $u \in \{ABB,BAB\}^*$ and
$v \in \{BBA,BAB,AaBB : a \in T\}^*$.

Now, productions ② and ③ can be applied. Let

$SBBAuSvSwA \Rightarrow^* SBBAuu_1Sv_1vSwA$   [(② + ③)*]   (5)

and

$SuSvSwA \Rightarrow^* Suu_1Sv_1vSwA$   [(② + ③)*]   (6)

be the longest parts of the derivation by productions ② and ③, i.e., the application of one of productions
④ to ⑧ follows.

I. In the first case, derivation (5), each of productions ④, ⑤, and ⑧ leads to an incorrect sentential
form. Thus, either production ⑥ or ⑦ has to be applied. In both cases, however, $v_1v$ has to be of the
form $v'AaBB$, for some $a \in T$, i.e.,

$SBBAuu_1Sv'AaBBSwA \Rightarrow SBBAu'Sv'SawA$   [⑥]   (7)

and the derivation proceeds as in (5) or

$SBBAuu_1Sv'AaBBSwA \Rightarrow Su'Sv'SawA$   [⑦]   (8)

and the derivation proceeds as in (6) because

$u' = uu_1 \in \{ABB,BAB\}^*$ and $v' \in \{BBA,BAB,AaBB : a \in T\}^*$.

By induction,

$SBBAu'Sv'SawA \Rightarrow^* Su''Sv''Sw''awA$   [(② + ③ + ⑥)*⑦],   (9)

for some $u'' \in \{ABB,BAB\}^*$, $v'' \in \{BBA,BAB,AaBB : a \in T\}^*$, and $w''aw \in T^*$.

II. In the second case, derivation (6), each of productions ⑥ and ⑦ leads to an incorrect sentential
form, and production ⑧ finishes the derivation, which, as shown above, implies that
$uu_1 = v_1v = \lambda$. Thus, assume that either production ④ or production ⑤ is applied. Then, in the
former case, $uu_1 = ABBu'$ and $v_1v = v'BBA$, and, in the latter case, $uu_1 = BABu'$ and
$v_1v = v'BAB$, i.e.,

$SABBu'Sv'BBASwA \Rightarrow Su'Sv'SwA$   [④]   (10)

and the derivation proceeds as in (6) or

$SBABu'Sv'BABSwA \Rightarrow Su'Sv'SwA$   [⑤]   (11)

and the derivation also proceeds as in (6) because

$u' \in \{ABB,BAB\}^*$ and $v' \in \{BBA,BAB,AaBB : a \in T\}^*$.

Notice that the application of a production constructed in ② would lead, in its consequence, to an
incorrect sentential form because the derivation would reach one of the following two forms

$SABBxSyAaBBSzA$ or $SBABxSyAaBBSzA$,

and each of productions ④ and ⑤ would move either $A$ in front of the first $S$, or at least one $B$
behind the last $S$. By induction, it implies that the successful derivation proceeds as

$Su'Sv'SwA \Rightarrow^* SSSwA \Rightarrow w$   [(③ + ④ + ⑤)*⑧].   (12)

Thus, we have proved that the following sequence of productions

(④ + ⑤)(② + ③)*(⑥ + ⑦)

cannot be applied in any successful derivation of $G$. Therefore, all applications of productions ⑥ and ⑦
precede any application of productions ④ and ⑤, which means that

$v \in \{BBA,BAB\}^*\{AaBB : a \in T\}^*(ABB + \lambda)$.

Finally, by skipping all productions ④ and ⑤ in the considered successful derivation
$S \Rightarrow^* w$, we have

$S \Rightarrow SBBASABBSA$   [①]
$\Rightarrow^* SuSvSwA$   [(② + ③ + ⑥)*⑦③*]
$\Rightarrow uvw$   [⑧],

where $u \in \{ABB,BAB\}^*$, $v \in \{BBA,BAB\}^*$, $u = v^R$ (see II), and $w \in T^*$. It is not hard to
see that by applications of the corresponding productions constructed in ② and ③, ignoring productions
⑥ and ⑦, and applying $S' \to \lambda$ immediately after the last application of productions constructed
in ③, we have that $S' \Rightarrow^* w_1w_2w$ in $G'$, where $w_1 \in \{A,C\}^*$ and $w_2 \in \{B,D\}^*$
are such that $h(w_1) = u$ and $h(w_2) = v$. As $u = v^R$, we have that $w_1w_2w \Rightarrow^* w$ by
productions $AB \to \lambda$ and $CD \to \lambda$, which completes the proof. □



4 Conclusion 

This section summarizes the results and open problems concerning the descriptional complexity of scat- 
tered context grammars known so far. 

One-nonterminal scattered context grammars: It is proved in [8] that scattered context grammars
with only one nonterminal (including erasing productions) are not able to generate all context sensitive 
languages. However, because of the erasing productions, it is an open problem whether they can generate 
a language which is not context sensitive. 

Two-nonterminal scattered context grammars: As far as the authors know, there is no published 
study concerning the generative power of scattered context grammars with two nonterminals. 

Three-nonterminal scattered context grammars: In this paper, we have shown that scattered con- 
text grammars with three nonterminals, where no more than nine nonterminals are simultaneously rewrit- 
ten during any derivation step, characterize the family of recursively enumerable languages. However, 
no other descriptional complexity measures, such as the number of non-context-free productions, are 
limited in this paper. 

Note that Greibach and Hopcroft [2] have shown that every scattered context grammar can be transformed
to an equivalent scattered context grammar where no more than two nonterminals are simultaneously
rewritten during any derivation step. This transformation, however, introduces many new nonterminals
and, therefore, does not improve our result. Thus, it is an open problem whether the maximal
number of nonterminals simultaneously rewritten during any derivation step can be reduced to two in
the case of scattered context grammars with three nonterminals.

Finally, it is also an open problem whether the number of non-context-free productions can be lim- 
ited. 

Four-nonterminal scattered context grammars: It is proved in [5] that every recursively enu- 
merable language can be generated by a scattered context grammar with four nonterminals and three 
non-context-free productions, where no more than six nonterminals are simultaneously rewritten during 
any derivation step. In comparison with the result of this paper, that result improves the maximal number 
of simultaneously rewritten symbols and limits the number of non-context-free productions. On the other 
hand, however, it requires more nonterminals. 

Five-nonterminal scattered context grammars: It is proved in [12] that every recursively enumer- 
able language can be generated by a scattered context grammar with five nonterminals and two non- 
context-free productions, where no more than four nonterminals are simultaneously rewritten during any 
derivation step. Note that this is the best known bound on the number of non-context-free productions. 
It is an interesting open problem whether this bound can also be achieved in case of scattered context 
grammars with three nonterminals. 

Scattered context grammars with one non-context-free production: In comparison with the pre- 
vious result, it is a natural question to ask what is the generative power of scattered context grammars 
with only one non-context-free production. However, as far as the authors know, this is another very 
interesting open problem. 

Nonerasing scattered context grammars: So far, we have only considered scattered context gram- 
mars with erasing productions. However, the most interesting open problem in this investigation area is 
the question of what is the generative power of nonerasing scattered context grammars. It is not hard 
to see that they can generate only context sensitive languages. However, it is not known whether non- 
erasing scattered context grammars are powerful enough to characterize the family of context sensitive 
languages. 

Finally, from the descriptional complexity point of view, it is an interesting challenge for future
research to find out whether results similar to those proved for scattered context grammars with
erasing productions can also be achieved in the case of nonerasing scattered context grammars.